Abstract

DNA methylation and chromatin states play key roles in development and disease. However, the extent of recent evolutionary
divergence in the human epigenome and the influential factors that have shaped it are poorly understood. To determine the links 
between genome sequence and human epigenome evolution, we examined the divergence of DNA methylation and chromatin 
states following segmental duplication events in the human lineage. Chromatin and DNA methylation states were found to have been
generally well conserved following a duplication event, with the evolution of the epigenome largely uncoupled from the total number 
of genetic changes in the surrounding DNA sequence. However, the epigenome at tissue-specific, distal regulatory regions was 
observed to be unusually prone to diverge following duplication, with particular sequence differences, altering known sequence 
motifs, found to be associated with divergence in patterns of DNA methylation and chromatin. Alu elements were found to have 
played a particularly prominent role in shaping human epigenome evolution, and we show that human-specific AluY insertion events 
are strongly linked to the evolution of the DNA methylation landscape and gene expression levels, including at key neurological genes
in the human brain. Studying paralogous regions within the same sample enables the study of the links between genome and 
epigenome evolution while controlling for biological and technical variation. We show DNA methylation and chromatin divergence 
between duplicated regions are linked to the divergence of particular genetic motifs, with Alu elements having played a dispropor- 
tionate role in the evolution of the epigenome in the human lineage. 

Background

Epigenomic features, such as DNA methylation and histone 
modifications, are involved in a number of key cellular 
processes ranging from the regulation of gene expression (Li 
and Reinberg 2011), to splicing (Shukla et al. 2011) and the
repression of transposable elements (Miura et al. 2001). 
Inactivation of the genes controlling DNA methylation in 
mice has been shown to be lethal during early development 
(Okano et al. 1999) and in humans, aberrant DNA methylation
and chromatin patterns have been linked to a number of 
human diseases including cancer and various neurodevelop- 
ment disorders (Urdinguio et al. 2009; Campbell and Turner 
2013). 

Despite the clear importance of DNA methylation and 
other chromatin features to development and disease, the 
extent of recent human epigenome evolution and the phe- 
nomena driving such changes remain poorly understood (Bird 
2011; Su et al. 2011). Genome-wide interspecies comparisons
of DNA methylation and chromatin states, now possible with
the advent of high-throughput sequencing technologies, have
provided glimpses of the extent and nature of epigenetic 
divergence between species (Shibata et al. 2012; Zeng et al. 
2012). However, it is still largely unclear what drives this 
divergence. Heritable spontaneous gains or losses of DNA 
methylation have been identified in plants that occur independently
of genetic mutations (Schmitz et al. 2011), suggesting
that DNA methylation divergence can occur independently of 
the underlying genomic sequence. However, particular 
genetic variants have also been observed to be linked to 
changes in DNA methylation levels within both populations 
(Gibbs et al. 2010) and individuals (Prendergast et al. 2012). 
Similarly, sites of DNA sequence-independent methylation variation
have been shown to be affected by rearrangements in neigh- 
boring regions (Foerster et al. 2011). Beyond the DNA 
sequence itself, it has been proposed that the broad epige- 
nomic context and chromosomal location of a region may also 
play a role in determining DNA methylation states (Lienert 
et al. 2011).

Studies investigating the links between underlying genetic
sequence and the divergence of DNA methylation and chromatin
have predominantly examined changes between individuals
or species. However, as the activity of determinants of
methylation and chromatin states (such as methyltransferases)
will differ between samples, these studies are confounded by 
natural biological variation. Likewise, the comparison of 
different samples will also often lead to the introduction of 
technical variation that can be difficult to control for. To cir- 
cumvent these problems, we have investigated how patterns 
of DNA methylation and chromatin variation, and their link to 
underlying DNA sequence divergence, can be studied within a 
single sample. Human segmental duplications, often defined 
as pairs of DNA sequences greater than 1 kb in length which 
align with more than 90% identity (She et al. 2004a), comprise
approximately 5% (~150 Mb) of the human genome
(Marques-Bonet et al. 2009). A previous study has suggested 
that certain histone modifications may vary between dupli- 
cated regions (Zheng 2008), but we lack a comprehensive 
view of epigenomic divergence between duplicons. In partic- 
ular, the extent of divergence in DNA methylation state 
between duplicons and its relationship to changes in the 
underlying sequence remains unknown. 

The study of divergence in the epigenome between para- 
logous regions has the potential to not only uncover the links 
between the evolution of DNA sequence, DNA methylation, 
and chromatin state but also allow us to investigate how 
duplications have potentially contributed to species evolution. 
Segmental duplication events have been an important 
mechanism by which new genes are created and current 
gene families expanded, providing a key mechanism for 
species evolution (De Grassi et al. 2008). Whether functional 
regulatory modules are also maintained and evolve following 
a duplication event has yet to be examined genome wide. If 
DNA methylation and chromatin states are broadly main- 
tained following the duplication of a region then this is likely 
to have implications for the expression of genes in close prox- 
imity to the new duplicon. Alternatively, divergence in chro- 
matin states following duplication potentially provides another 
mechanism for the evolution of a locus and the neo- or
subfunctionalization of a genomic region, beyond simply the 
evolution of the underlying genome sequence. DNA sequence 
is thought to evolve in a relatively clock-like fashion across the 
genome, with the number of changes between duplicons 
expected to increase with increasing time since the duplication 
event. Whether the epigenome evolves in the same way, and 
at similar rates, remains largely unknown. 

In this study, we examined DNA methylation and chroma- 
tin divergence at the tens of thousands of loci that have been 
duplicated across the human genome to investigate the links 
between the evolution of the genome and epigenome in 
unprecedented detail. Human embryonic stem cells were 
used as the primary model in this study, but we validate the 
results in other cell types and extend our observations to
examine the evolution of the human brain epigenome since
divergence from chimpanzee. This study provides the first 
comprehensive analysis of DNA sequence, DNA methylation, 
and chromatin divergence across paralogous sites in the 
human genome. 

Results and Discussion 

Widespread Conservation of DNA Methylation and 
Chromatin States Following Duplication 

We first examined how methylation states have evolved 
following a segmental duplication event in the human lineage 
(>1 kb in length and >90% identity). Examination of paralo- 
gous CpG sites in the human genome illustrates that DNA 
methylation levels have been strikingly well conserved 
following a duplication event and the insertion of a homolo- 
gous sequence into a new genomic location. As shown in 
figure 1, both unmethylated and methylated sites overwhelmingly
maintain their approximate methylation levels at both
paralogous copies of a duplicated region. Of 82,692 paralo- 
gous pairs of CpG sites examined in H1 embryonic stem (ES) 
cells, 78.4% displayed an absolute difference of 20% or less in 
methylation levels (permutation P value < 0.01; Spearman's 
rank correlation of 0.23, P < 2.2 x 10^-16). High levels of conservation
in methylation levels were also observed in the H1-derived
neural progenitor and IMR90 cell lines (permutation P
values for both <0.01; Spearman's rank correlations of 0.37
and 0.48, respectively, both P < 2.2 x 10^-16; fig. 1 and supplementary
fig. S1, Supplementary Material online). Both
methylated and unmethylated sites were observed to gener- 
ally display high levels of conservation (fig. 1). No consistent 
difference was observed between the methylation state of the 
ancestral and derived copies of paralogous CpG sites in these 
three cell types, with the derived copy of interchromosomal 
duplicated CpG sites observed to be as likely as the ancestral 
copy to have a low (<50%), putatively functional, methylation
state (supplementary fig. S2, Supplementary Material online). 
Perhaps surprisingly, DNA methylation divergence was observed
to be largely uncoupled from the mean sequence-
level divergence in the surrounding region. Although methyl- 
ated CpGs predominate in the human genome, this remained 
the case when only examining paralogous CpG sites with at 
least one lowly methylated locus (<50% methylation, 
Kruskal-Wallis test of association between methylation and 
average sequence divergence as in fig. 2, P=0.31). We 
could also find no evidence that substitutions within close 
proximity of the CpG site were more likely to be associated 
with divergence in methylation levels than those further away. 
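
The conservation measures quoted above (the share of paralogous CpG pairs whose methylation levels differ by 20% or less, a permutation P value, and Spearman's rank correlation) can be illustrated with a minimal sketch. It assumes methylation levels are supplied as two equal-length lists of percentages, one per paralogous copy. The function names, the toy values, and the choice to permute the pairing between copies are illustrative assumptions, not the procedure used in this study.

```python
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fraction_conserved(x, y, max_diff=20.0):
    """Share of pairs whose absolute methylation difference is <= max_diff."""
    return sum(abs(a - b) <= max_diff for a, b in zip(x, y)) / len(x)


def permutation_p(x, y, n_perm=1000, seed=0):
    """One-sided P value: how often a random pairing is at least as conserved."""
    rng = random.Random(seed)
    observed = fraction_conserved(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fraction_conserved(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy methylation percentages at eight hypothetical paralogous CpG pairs.
copy_a = [0, 5, 10, 90, 95, 100, 50, 80]
copy_b = [2, 0, 30, 85, 100, 70, 55, 95]
print(fraction_conserved(copy_a, copy_b))
print(spearman(copy_a, copy_b))
print(permutation_p(copy_a, copy_b))
```

With real data the two lists would hold one entry per paralogous CpG pair, so the same three calls summarize a whole cell type.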

This conservation of DNA methylation levels is matched by 
conservation of a wide variety of chromatin features at dupli- 
cated loci. The location of various histone modifications, as 
well as CTCF binding (a chromatin regulator [McDaniell et al. 
2010]) and DNase I hypersensitivity (a marker of functional 



Genome Biol. Evol. 6(7):1758-1771. doi:10.1093/gbe/evu142 Advance Access publication June 24, 2014






Prendergast et al. 



GBE 



[Figure 1 panels: H1 DNA methylation; H1-derived neural progenitor DNA methylation; H1 H2BK15ac; H1 H3K9me3. Axis label for the read-count panels: Paralogous region 1 (read count).]



Fig. 1. — Strong conservation of DNA methylation and other chromatin features following a segmental duplication event. Comparative methylation 
levels at paralogous CpG sites in the H1 and H1-derived neural progenitor cell lines are shown in the top two panels (corresponding plot for IMR90 shown in
supplementary fig. S1, Supplementary Material online). Intensity of color corresponds to the density of paralogous pairs of CpG sites with the corresponding
methylation levels. Densities are rescaled for the subplots displaying the lower areas of the graphs in more detail. The lower four panels show the read counts
from three ChIP-seq and one DNase-seq experiment found at each pair of 500-bp paralogous regions in the H1 cell line. Paralogous pairs of windows with no
reads mapping to either region are excluded from these plots. The corresponding plots for all 23 histone modifications examined can be found in 
supplementary figure S2, Supplementary Material online. 






Human Epigenome Evolution 






regulatory regions [McDaniell et al. 2010]), were all well
conserved between paralogous regions (fig. 1 and supplementary
fig. S3, Supplementary Material online). Thus,
broad chromatin states, reflected in many chromatin features,

Fig. 2. — DNA methylation divergence is independent of levels of
surrounding DNA sequence divergence. Mean difference in methylation 
levels between paralogous CpG sites associated with different levels of 
flanking sequence divergence. The number of single-base substitutions 
was measured in the DNA sequence 500 bp either side of the correspond- 
ing CpG site (or to the end of the duplicated region if closer); 95% 
confidence intervals are shown along with the genome-wide mean differ- 
ence in methylation levels between paralogous CpG sites (red horizontal 
line). No significant difference was observed in the average difference in 
methylation levels between paralogous CpG sites of different levels of 
flanking sequence divergence (P=0.17, Kruskal-Wallis test). 



have generally been well conserved following the insertion of 
a DNA sequence into a new genomic location and higher 
order chromatin environment. 

DNA Methylation and Chromatin Divergence Are Linked 
to Divergence at Specific Local Sequence Motifs 

We next investigated whether, where divergence in DNA
methylation and chromatin states has occurred, it is related
to particular changes in the underlying DNA sequence. If the 
evolution of the epigenome is entirely uncoupled from the 
evolution of the DNA sequence then no particular DNA 
motifs would be expected to show enrichment around 
either the methylated or unmethylated copies of discordantly 
methylated, paralogous CpG sites. However, as shown in 
table 1, particular motifs were observed to be linked to methylation
divergence. This included a motif matching the known
chromatin regulator SP1 (q = 0.001). Further analysis highlighted
that the loss of these putative SP1-binding sites is
also associated with detectable falls in the observed levels of 
SP1 binding at these regions (fig. 3A). 

As well as the SP1 binding site, a number of other motifs 
were found to be enriched around the hypomethylated copies 
of discordant CpG sites (table 1). These include a motif matching
the multiple start element downstream-1 (MED-1) sequence
that was found to be associated with divergent CpG
sites found outside CpG islands across cell types. This motif 
was found around 35.9% of the hypomethylated copies of 
these discordant CpG sites but at only 5.04% of the matching 



Table 1
Motifs Enriched around the Hypomethylated Copies of the Significantly Discordant (P< 5 x 10" Paralogous CpG Sites in Various Tissues

Motif                   P         Top Match  Match Score  q            Program  Notes

H1 cell line (32 pairs of sites)
YCCCSCCKCCTCMKCCTCCC    1.80E-25  SP1        -            0.001        MEME     -
KKGSKGKGRRYRSGG         1.80E-04  Zfp281     -            0.0026       MEME     SP1 q = 0.0026
STYTTYTTTTYYTTTTTTTT    1.40E-29  MTF1       -            0.086        MEME     -
SRSGSSYSAGSCMCCGYSSC    1.60E-06  -          -            -            MEME     -
TTARDACWGT              1.00E-12  -          -            -            Homer    -

H1-derived neural progenitor cell line (46 pairs of sites)
TTTYTTWTTYTTTTTYTTTT    2.50E-43  -          -            -            MEME     -
YYCWCCYKCCYCWSYCYCCC    6.10E-35  SP1        -            0.043        MEME     Zfp281 q = 0.043
SCRGGCTGGRGTSSRRKGGM    3.20E-15  -          -            -            MEME     -
YTCYCRAAKTGYTKGKATTA    1.70E-12  -          -            -            MEME     -
WAWWTTTKTWTKTTTADKWG    3.80E-11  -          -            -            MEME     -
GYGAGSCASCGCSCCYGGCC    2.00E-09  -          -            -            MEME     SP1 q = 0.14
NFY (RGCCAATSRG)        NA        NFY        -            0.094        Homer    Known motif enrichment result

Prefrontal cortex primary tissue (22 pairs of sites)
MWKCYYCYCCYYMMSCCYCC    8.60E-06  Zfp281     -            9.7 X 10""^  MEME     SP1 q = 0.053
VVrYVVrTKTMTTKYTTTYTW   2.10E-05  -          -            -            MEME     -

Sites not in a CpG island (13 pairs from each cell type)
DGGAGCGCWK              1.00E-12  MED-1      0.73         -            Homer    -
GGCCCCCA                1.00E-12  Zfp281     0.76         -            Homer    -

Note. — To ensure each region only appeared once in this analysis, where more than one pair of discordant CpG sites was within 1 kb only one was kept (arbitrarily 
chosen). 

Fig. 3. — DNA methylation and chromatin divergence are associated
with variation in known sequence motifs. (A) Loss of putative SP1-binding
site at methylated copies of discordant paralogous CpG sites is associated
with a corresponding lack of SP1 binding at these loci. SP1 ChIP-seq read
depths 500 bp either side of the corresponding methylated and unmethylated
copies of discordant paralogous CpG sites are shown. (B) Divergence
in chromatin patterns between paralogous regions is consistently associated
with the loss of transcription factor-binding sites. Only the 11 chromatin
marks with at least 70 discordant pairs of regions were analyzed,
with the number of discordant pairs of regions shown in brackets 



hypermethylated copies (P= 1 x 10~^^). The MED-1 sequence 
is a downstream protein-binding element previously linked to 
TATA-less promoters with multiple distinct start sites (Butler 
and Kadonaga 2002). Although, as far as we are aware, not 
previously linked to DNA methylation divergence, mutations 
at this element within the P-glycoprotein promoter have been 
shown to lead to a reduction in transcription of the gene 
through selectively decreasing the use of alternative transcrip- 
tional start sites (TSSs) (Ince and Scotto 1995). Together these 
results suggest that mutations at this element lead to changes 
in methylation levels linked to altered transcription levels. 
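
The comparison above, in which a motif occurs around a much larger share of hypomethylated than hypermethylated copies, can be sketched as a simple consensus scan. This is a hedged illustration only: it converts an IUPAC consensus such as the MED-1-like motif from table 1 into a regular expression and counts the sequences containing it on either strand. It is a plain regex scan, not the MEME or Homer procedures listed in table 1, and the helper names and toy sequences are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of an IUPAC consensus string."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a pattern matching the motif on either strand."""
    forward = "".join(IUPAC[base] for base in motif)
    reverse = "".join(IUPAC[base] for base in reverse_complement(motif))
    return re.compile(f"{forward}|{reverse}")


def fraction_with_motif(sequences, motif):
    """Share of sequences containing at least one motif occurrence."""
    pattern = motif_regex(motif)
    return sum(1 for seq in sequences if pattern.search(seq.upper())) / len(sequences)


# Toy flanking sequences (hypothetical) around each copy of discordant CpG sites.
hypomethylated = ["TTAGGAGCGCAGTT", "CCCCTGCGCTCCAAA", "ACGTACGTACGT"]
hypermethylated = ["ACGTACGTACGT", "TTTTTTTTTT"]

print(fraction_with_motif(hypomethylated, "DGGAGCGCWK"))
print(fraction_with_motif(hypermethylated, "DGGAGCGCWK"))
```

Scanning both strands matters here because a consensus reported on one strand can equally sit on the other strand of a duplicated region.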

The divergence of many chromatin features (various his- 
tone modification levels, CTCF binding, and DNase I hyper- 
sensitivity) between paralogous regions was also found to be 
linked to the divergence in particular transcription factor-bind- 
ing sites (supplementary table S1, Supplementary Material 
online). The "chromatin depleted" copies (those duplicons 
with relatively less of a given chromatin feature) of discordant 
paralogous pairs were consistently associated with a relative
lack of known protein-binding motifs (fig. 3B). These included 
motifs known to be associated with particular chromatin 
states. For example, the known CTCF-binding site was 
found to be substantially depleted from the copies of para- 
logous regions lacking CTCF binding (supplementary table S1, 
Supplementary Material online), validating this approach for 
detecting sequence motifs linked to chromatin states; how- 
ever, various other transcription factor motifs were observed 
to be linked to the divergence in particular chromatin marks. 
The full list of motifs linked to the divergence of each chro- 
matin state can be seen in supplementary table S1, 
Supplementary Material online. The general link observed be- 
tween transcription factor binding and chromatin divergence 
supports the concept of pioneer transcription factors (Zaret 
and Carroll 2011), whose initial binding at a region enables 
subsequent chromatin remodeling and the recruitment of his- 
tone modification enzymes. The approach presented here 
provides an indication of which transcription factors are 
most strongly linked to variation in particular chromatin features
and might therefore act as pioneers in the H1 cell type.

Distal Regulatory Regions Have Been Foci for DNA 
Methylation and Chromatin Divergence 

Although DNA methylation levels have previously been shown 
to be correlated with local CpG content (Gaidatzis et al. 2014),
strong conservation of DNA methylation levels was observed 
following a duplication event irrespective of local CpG density 



Fig. 3. — Continued 

following the modification name. A corresponding list of each of the 
known sequence motifs enriched at either the chromatin enriched or 
depleted copies of paralogous regions can be found in supplementary 
table S1, Supplementary Material online.

(supplementary fig. S4, Supplementary Material online).
However, the subset of CpG sites in the H1 cell line most 
discordant in their methylation levels with respect to their 
paralogous copy (654 sites with a methylation difference 
>80%) were found to generally locate to CpG islands (8% 
of sites) or CpG island "shores" (51% of sites, defined here as
less than 2 kb from the neighboring CpG island), regions important
in gene regulation and disease (Doi et al. 2009;
Irizarry et al. 2009). Thus, methylation divergence is seen at 
regions where methylation levels are known to be particularly 
important to regulatory function and is not simply restricted to 
CpG-poor, putatively nonfunctional regions of the genome.
This is in contrast to sequence divergence, where elevated 
rates of divergence are often seen at nonfunctional regions 
of the genome lacking selective constraint. 

CpG islands are often associated with gene promoters; 
however, examination of the proximity to promoter regions 
of the most significantly discordant CpG sites in the H1 cell line 
(53 pairs of sites with P< 5 x 10~^, Fisher's exact test; sup- 
plementary table S2, Supplementary Material online) revealed 
that they were found almost exclusively distal to known TSSs 
(fig. 4). In stark contrast, methylation levels at CpG sites within 
1 kb of promoter regions are strongly conserved following a 
duplication event (fig. 4). The specific conservation of meth- 
ylation levels at proximal promoter regions is consistent with a 
more focused study of methylation levels at ten mouse promoter
regions (Lienert et al. 2011). Thus, although DNA methylation
divergence is linked to CpG islands, those CpG sites
close to promoters are generally well conserved; it is the sites
at more distal CpG-dense regions that show the highest levels
of divergence.
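
The text identifies significantly discordant paralogous CpG sites with Fisher's exact test. A minimal sketch of such a test follows, under the assumption (not stated in this section) that the inputs are counts of methylated and unmethylated reads at each copy. The function names, the read counts, and the `alpha` value are placeholders chosen for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    probs = [prob(x) for x in range(low, high + 1)]
    # Small relative tolerance so equally likely tables are counted despite rounding.
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-9)))


def is_discordant(meth_a, unmeth_a, meth_b, unmeth_b, alpha):
    """Flag a paralogous CpG pair whose read counts differ at level alpha."""
    return fisher_exact_two_sided(meth_a, unmeth_a, meth_b, unmeth_b) < alpha


# Hypothetical read counts: copy A mostly methylated, copy B mostly unmethylated.
p_value = fisher_exact_two_sided(18, 2, 1, 19)
print(p_value, is_discordant(18, 2, 1, 19, alpha=1e-3))  # alpha is a placeholder
```

Because the test conditions on read depth, a pair covered by few reads cannot reach a small P value however different the two methylation percentages look.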

To investigate the regulatory potential of these distal re- 
gions showing methylation divergence, we analyzed the oc- 
currence of 23 histone modifications around these 53 most 
discordant CpG sites. Those associated with active regulatory 
regions, including H3K4me2, H3K4me3, and H3K9ac, were 
indeed significantly enriched around the unmethylated copies 
of discordant paralogous CpG sites, and depleted around the 
corresponding methylated copies (fig. 5 and supplementary 
fig. S5 and table S3, Supplementary Material online). These 
patterns are observed in spite of these chromatin features 
being generally well conserved following a duplication event 
(fig. 1 and supplementary fig. S2, Supplementary Material 
online). Other histone modifications, including H3K9me3 
and H3K36me3, that are not preferentially found at regula- 
tory regions, displayed no relative enrichment around methyl- 
ated or unmethylated copies of discordant CpG pairs 
(supplementary fig. S5 and table S3, Supplementary 
Material online). In addition, the unmethylated copies of dis- 
cordant CpG sites were substantially enriched for CTCF bind- 
ing and DNase I hypersensitivity, general markers of functional 
regulatory regions (fig. 5 and supplementary fig. S5 and table 
S3, Supplementary Material online). We conclude that the 
divergence of DNA methylation levels between duplicons is 



[Figure 4 plot. Legend: both CpG sites unmethylated; discordant CpG sites. Axis label: Paralogous site 2 - distance to closest TSS (bp).]

Fig. 4. — Elevated rates of divergence in methylation levels at pairs of 
paralogous CpG sites distal to TSSs. Distance to nearest TSSs of paralogous 
pairs of CpG sites completely unmethylated on both copies (red) and pairs 
of paralogous CpG sites significantly different in their methylation levels 
(blue). Contour lines correspond to a 2D kernel density estimate for each 
group of points highlighting the separate clustering of pairs of discordant 
paralogous CpG sites and paralogous sites unmethylated on both copies. 
Discordant pairs of CpG sites are generally greater than 1 kb from the 
nearest TSS. 



associated with the evolution of other chromatin features, 
consistent with the emergence or destruction of distal regu- 
latory regions in the human genome. It is consequently the 
small fraction of the epigenome at functional distal regulatory 
regions that appears to have evolved most rapidly in the human
lineage. 

Sites Differentially Methylated during Differentiation Are 
Particularly Prone to Methylation Divergence during 
Evolution 

Many functional sites in the genome undergo transitions in 
DNA methylation during cellular differentiation and are 
thought to modulate regulatory interactions and transcription 
(Mohn and Schübeler 2009; Ong and Corces 2011). How are
these sites, implicated in development and cancer (Jones 
2012), related to sites showing evolutionary divergence in 
DNA methylation? To test this, we examined whether the 
difference in methylation levels between the 82,692 pairs of 
paralogous CpG sites in H1 ES cells was correlated to the
observed change in methylation levels of the same sites between
H1 and the H1-derived neural progenitor cell types. As
can be seen in figure 6, in general, the larger the observed 
change in methylation of a CpG site following differentiation 
(i.e., between cell types), the larger the difference in methyl- 
ation levels between the same CpG site and its paralogous site 

Fig. 5. — Divergence of methylation state at distal regions is associated with divergence in regulatory chromatin features. The top three panels show the
total ChIP-seq/DNase-seq reads for three chromatin marks found 500 bp either side of the methylated and unmethylated copies of discordant pairs of CpG
sites. Read counts are significantly higher around the unmethylated sites (corresponding plots for all 25 chromatin features examined with associated P values
shown in supplementary fig. S5 and table S3, Supplementary Material online). Corresponding plots with the window size increased to 500 kb either side of the
CpG sites, illustrating this is a local effect and not a broader feature of the genomic regions, are shown in supplementary figure S11, Supplementary Material
online. The bottom three panels display the read depths for the same chromatin marks at nondiscordant paralogous CpG sites completely unmethylated on 
both copies (CpG site labeled as "second" site being randomly chosen). Pairs of sites in each panel are sorted separately according to read counts around the 
unmethylated/second CpG sites. 

Fig. 6. — Sites of cell type-specific methylation are particularly prone
to divergence following duplication. The observed difference in methylation
level between CpG sites in the H1 cell line and their paralogs (y axis). Sites
are grouped by their observed methylation level in H1 (x axis) and their
observed change in methylation following differentiation (colored bins). 
The cutoffs for the bins were selected, so that each category contained 
approximately the same number of CpG sites. 



(i.e., within the same cell type). This suggests that sites 
showing regulated alterations in methylation during differen- 
tiation are also particularly prone to diverge following dupli- 
cation in embryonic stem cells. The direction of change in 
methylation levels is generally the same between duplicated 
copies as between cell types (fig. 6) highlighting that these 
sites do not simply show higher variability in their methylation 
levels. 

The extent of divergence in methylation levels following 
differentiation and between paralogous sites was found to 
be largest at sites unmethylated and lowly methylated in the 
H1 stem cell line (<50% methylated, fig. 6). A simple linear 
model incorporating the methylation level of each individual 
CpG site and its observed change in methylation levels follow- 
ing differentiation was found to be sufficient to explain a sub- 
stantial proportion of the variation in the observed differences 
in methylation levels between paralogous CpG sites {R^: 0.33, 
Consequently, the methylation levels of sites 
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of cell-type-specific methylation are particularly prone to di- 
verge following a segmental duplication event and subse- 
quent DNA sequence divergence. 
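
The linear model described above can be sketched as an ordinary least-squares fit with two predictors. The exact model specification is not given in this section, so the additive form below (intercept, methylation level, change on differentiation, no interaction term), the function names, and the toy values are all assumptions made for illustration.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_linear_model(level, change, paralog_diff):
    """Fit paralog_diff ~ intercept + level + change; return (coefficients, R^2)."""
    rows = [[1.0, l, c] for l, c in zip(level, change)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, paralog_diff)) for i in range(3)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(coef, r)) for r in rows]
    mean_y = sum(paralog_diff) / len(paralog_diff)
    ss_res = sum((y - f) ** 2 for y, f in zip(paralog_diff, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in paralog_diff)
    return coef, 1.0 - ss_res / ss_tot


# Toy values (hypothetical): methylation level in one cell type, change on
# differentiation, and difference from the paralogous site, all in percent.
level = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
change = [5.0, 0.0, 15.0, 10.0, 20.0, 2.0]
paralog_diff = [6.0, 4.0, 20.0, 18.0, 35.0, 20.0]
coefficients, r_squared = fit_linear_model(level, change, paralog_diff)
print(coefficients, r_squared)
```

The returned R^2 is the share of variance in the paralog differences that the two predictors account for, which is the quantity the text reports.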

Alu Elements Are Associated with DNA Methylation 
Divergence at Flanking Sites 

Despite the observed links with DNA sequence motifs, one of 
the strongest correlates to DNA methylation divergence be- 
tween paralogous CpG sites was found to be discordance in 
the distance to the nearest Alu element. The hypermethylated 
copy of the 53 discordant pairs of paralogous CpG sites in the 
H1 cell line was found, in general, to be significantly closer to an
Alu element than its corresponding hypomethylated copy
(supplementary fig. S6, Supplementary Material online). The 
other major repeat classes displayed no similar enrichment 
around either the methylated or unmethylated copies of discordant
paralogous CpG pairs (LINE P=0.85, LTR P=0.74,
simple repeat P=0.69; paired Mann-Whitney U tests), suggesting
that methylated sites in discordant pairs are not simply
associated with regions densely populated by repeat ele- 
ments, which might be expected at regions simply under 
less evolutionary constraint, but are specifically associated 
with Alu element insertion events. Alu elements have been 
implicated in the creation of segmental duplication events and 
are often found at the junctions of duplicated regions (Bailey 
et al. 2003). However, we could find no evidence that discor- 
dant CpG pairs were simply closer to junctions than nondis- 
cordant CpG sites (discordant CpG sites median distance to 
junction: 364 bp and nondiscordant CpG sites median dis- 
tance to junction: 341 bp; Mann-Whitney U test P=0.13). 
These data are consistent with previous proposals that certain 
transposable elements may play functional roles in regulation 
as a result of their general high levels of methylation affecting 
the methylation state of nearby CpG sites (Wang et al. 2011).
The results presented here suggest Alu elements may have 
played a substantial role in the evolution of the epigenome 
in the human lineage. Of the 32 pairs of paralogous regions 
containing CpG sites discordant in their methylation levels 
(i.e., the 32 pairs of regions containing the 53 significantly 
discordant pairs of CpG sites), the methylated copies in each 
pair were closer to an Alu element in 22 cases (with four pairs 
showing no difference in the distance to an Alu element). Of 
the six remaining pairs of regions, the unmethylated copy was 
substantially closer (>15 bp) to an Alu element in only two
cases. Consequently, the methylated copies of CpG sites that 
have diverged following duplication are highly enriched for 
proximity to Alu elements. 
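
The counts in this paragraph (32 pairs of regions, 22 with the methylated copy closer to an Alu element, 4 ties, and so 6 with the unmethylated copy closer) lend themselves to a quick check with a two-sided sign test. This is a simpler stand-in chosen for illustration, not the paired test reported in the text, and the function name is hypothetical.

```python
from math import comb


def sign_test_two_sided(n_first, n_second):
    """Two-sided exact sign test; tied pairs must be dropped beforehand.

    Under the null hypothesis each non-tied pair favors either copy with
    probability 0.5, so the larger count follows a Binomial(n, 0.5) tail.
    """
    n = n_first + n_second
    k = max(n_first, n_second)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


total_pairs = 32
methylated_closer = 22
ties = 4
unmethylated_closer = total_pairs - methylated_closer - ties  # 6 pairs

print(sign_test_two_sided(methylated_closer, unmethylated_closer))
```

The sign test ignores how much closer each copy is, so it uses less information than a rank-based paired test on the distances themselves.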

The loss of SP1 binding and the close proximity of an 
Alu element were observed to often co-occur at regions di- 
vergent in their methylation levels. Of the 22 discordant pairs 
of CpG sites where the methylated copy was closer to an Alu 
element, the corresponding SP1 ChIP-seq read count was also
lower at the methylated copy in 19. Although methylation
levels at TSSs were observed to be generally relatively stable,
one of the few sites of divergence in methylation levels at a 
TSS (a duplicated CpG island at the 5'-end of the TPTE
and LOC400927 genes) is linked to divergence in both 
the proximity to the closest Alu element as well as SP1 bind- 
ing (supplementary fig. S7, Supplementary Material online). 
A corresponding large change in expression is observed 
between these genes, with LOC400927 being expressed 
at approximately 30 times the level of TPTE in the H1
cell line (LOC400927 reads per kilobase per million reads 
[RPKM]: 1.79 and TPTE RPKM: 0.061). It may be that 
methylation levels are generally well conserved following 
a duplication event because such "multiple hits" are re- 
quired to substantially remodel the methylation levels at a 
region. 

Alu Element Insertions Are Linked to the Remodeling of 
Methylation Patterns in the Human Brain 

To investigate further how Alu elements have potentially 
shaped key phenotypes in humans through affecting the evo- 
lution of the human epigenome, we looked at how these 
findings from paralogous regions translated to the whole 
genome and key methylation differences between humans 
and chimpanzees by characterizing the location of human- 
specific Alu insertions and their link to methylation divergence 
in primary human brain tissue. In total, we identified 4,435 Alu 
elements present in the human genome but absent from the 
corresponding orthologous regions of the chimpanzee and 
orangutan genomes. Average methylation levels at conserved 
orthologous CpG sites flanking these human Alu element 
insertion sites were found to be significantly higher in 
human prefrontal cortex samples than in matched chimpan- 
zee samples (fig. 7A). This elevation in human methylation 
levels at sites flanking human-specific Alu insertion events 
was observed for sites both methylated and unmethylated 
in the chimpanzee genome. Examination of chimpanzee-spe- 
cific insertions highlighted that these are also associated with 
increases in flanking methylation levels in the chimpanzee 
genome (fig. 7B). Subdivision of the human Alu insertions 
into families highlighted that the increase in flanking methyl- 
ation levels is predominantly related to AluY insertions, the 
most active family in recent primate history (Chimpanzee 
Sequencing and Analysis Consortium 2005), with no observ- 
able changes in flanking DNA methylation levels linked to the 
less common AluJ and AluS insertions (fig. 7C-F). We con- 
clude that not only at paralogous regions but also across the 
genome AluY element insertions in the human lineage have 
been linked to the remodeling of local methylation patterns, 
including in the human brain. 
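
The windowed averaging behind figure 7 (500 bp windows with a 100 bp offset, mean methylation and a 95% confidence interval per window) can be sketched as follows. This is a minimal illustration, not the authors' code: the input layout, the function name, and the use of a normal-approximation interval (1.96 standard errors) are all assumptions, since the text does not state how the intervals were computed.

```python
from math import sqrt
from statistics import mean, stdev


def windowed_means(sites, start, end, width=500, step=100):
    """Mean methylation and approximate 95% CI per sliding window.

    sites: iterable of (position, methylation), with position in bp
    relative to the insertion site and methylation in [0, 1].
    Returns a list of (window_start, mean, ci_low, ci_high).
    """
    sites = sorted(sites)
    rows = []
    for w_start in range(start, end - width + 1, step):
        values = [m for pos, m in sites if w_start <= pos < w_start + width]
        if len(values) < 2:
            continue  # too few CpG sites to estimate a standard error
        centre = mean(values)
        # Assumption: normal approximation to the sampling distribution.
        half_width = 1.96 * stdev(values) / sqrt(len(values))
        rows.append((w_start, centre, centre - half_width, centre + half_width))
    return rows


# Hypothetical toy input: one CpG every 50 bp, all half methylated.
toy_sites = [(i * 50 - 1000, 0.5) for i in range(40)]
for row in windowed_means(toy_sites, -1000, 1000)[:3]:
    print(row)
```

Because adjacent windows overlap by 400 bp, neighbouring means are not independent, so the intervals are best read window by window.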

Fig. 7. Alu elements are linked to DNA methylation divergence at flanking sites. Elevation in prefrontal cortex methylation levels at CpG sites in close
proximity to Alu insertion events. Only orthologous CpG sites present in both species were retained. Sites were grouped into 500 bp windows with a 100-bp
offset. Mean methylation levels for each window and corresponding 95% confidence intervals are shown. Panels correspond to methylation levels around
sites of (A) Alu insertions in the human lineage (paired t-test comparing corresponding methylation levels within 1 kb of insertions site in human and …

Examination of the differences in expression levels among
humans, macaques, and chimpanzees where a human-specific
Alu element insertion in close proximity to a gene promoter is
linked to an increase in methylation illustrates that this is
generally linked to a lower expression of the corresponding
genes in humans relative to chimpanzees (fig. 8).
Consequently, these changes in methylation levels linked to 
an Alu element insertion are linked to downstream changes in 
gene expression. Examination of the location of these CpG 
sites that have diverged in their methylation levels following 
the insertion of an Alu element in close proximity (relative 
to those sites where methylation levels have not changed 
following an Alu insertion nearby) shows that they are significantly
enriched in regions harboring genes with neural functions, including
those involved in neurotransmitter transport, synapse function, and
insulin secretion (supplementary fig. S8, Supplementary Material
online). Thus, Alu element insertion events in the human lineage
appear to be directly linked to the remodeling of methylation levels
around regulatory regions involved in key brain pathways, are linked
to interspecies changes in gene expression, and may consequently have
contributed to some of the key phenotypic differences between humans
and our closest relatives.

Fig. 8. Alu insertions are linked to interspecies gene expression
changes. Genes displaying a change in methylation at their promoter
following a human-specific Alu insertion in close proximity also
disproportionately display a corresponding lower expression level in humans.



Conclusions 

The evolution of the human genome sequence has been in- 
tensively studied over the past decade, providing numerous 
important insights into the evolution of the human lineage. 
However, despite their substantial importance to various traits 
and diseases, comparatively little is known about how DNA 
methylation and chromatin states evolve. Through the study 
of many tens of thousands of paralogous CpG sites and 25 
chromatin marks, we have shown that DNA methylation and 
chromatin levels are surprisingly well conserved following seg- 
mental duplication events. Following the duplication of CpG 
islands and regulatory regions, and their insertion into a new 
genomic location, there is in general little divergence in DNA 
methylation levels or patterns of chromatin. This implies that 
intact regulatory modules have been copied to a new location 
in the human genome, and a new genomic neighborhood, 
while maintaining their original spectrum of DNA methylation 
and chromatin states. It has already been shown in vitro that 
the insertion of genes adjacent to previously distant regulatory 
regions can affect their expression patterns (Weiler and 
Wakimoto 1995). Just as gene duplication is now regarded 
as a key substrate for genome evolution, this duplication of 
functional, regulatory modules is likely to have provided a rich 
source of phenotypic variation. 

Where divergence in methylation patterns did occur, it was 
observed to be largely uncoupled from the average rate of 
divergence of the surrounding DNA sequence. The gross levels
of genomic and epigenomic divergence at a locus appear to
be largely independent. Under the neutral theory of molecular 
evolution (Kimura 1989), the amount of DNA divergence 
between two paralogous regions should be approximately 
related to the time since duplication of the ancestral region. 
In contrast, it appears that methylation divergence between 
paralogous CpG sites is largely unlinked to the time since the 
corresponding duplication event. This argues against DNA 
methylation levels evolving in a neutral, clock-like fashion. 
Consistent with this, methylation divergence is enriched at 
CpG island and shore distal regulatory regions where DNA 
methylation levels are known to be functionally important 
(Doi et al. 2009; Irizarry et al. 2009). Protein-coding DNA se- 
quences have been shown to sometimes experience unusually 
high levels of positive selection following duplication (Zhang 
2003). Analogously, the elevated rate of DNA methylation- 
level divergence at functional regions relative to other CpG 
sites may be indicative of positive selection acting on the 
methylation state of duplicated regulatory regions. 

The mechanisms underlying evolutionary divergence of the 
epigenome have until now been poorly characterized. 
Although DNA methylation and chromatin divergence were 
observed to be largely uncoupled from the average sequence 
divergence between paralogous regions, in this study we have 
shown that divergence in many features of chromatin struc- 
ture between two paralogous regions is linked to divergence 
at particular DNA sequence motifs. For example, unmethy- 
lated copies of discordant CpG sites were preferentially asso- 
ciated with a GC basepair-rich motif matching the known 
binding site of a key chromatin regulator, SP1. Data directly
assaying SP1 binding at these loci confirmed that binding was
preferentially associated with the unmethylated copies of discordant
CpG sites. Artificial mutations in the SP1 motif of the
mouse Gtf2a1l promoter have previously been shown to be
associated with loss of neighboring CpG methylation (Lienert
et al. 2011). We have shown here that the evolution of this
and other key binding motifs has been linked to the diver- 
gence of methylation levels across a range of locations in the 
human genome. 

Particular DNA-binding motifs were also observed to have 
diverged between regions discordant for particular chromatin 
marks. In particular, discordance in the presence of particular 
histone modifications between paralogous regions was ob- 
served to be linked to divergence in the motifs for particular 
transcription factors. This suggests that the loss of key tran- 
scription factor-binding motifs leads to the loss of binding of 
the corresponding transcription factor at the region and the 
loss of the subsequent recruitment of the corresponding chro- 
matin mark, supporting the concept of pioneer transcription 
factors (Zaret and Carroll 2011). Single-nucleotide polymor- 
phisms at transcription factor motifs have recently been linked 
to chromatin divergence (Kasowski et al. 2013), in agreement 
with these findings. However, such population-based studies
not only suffer from biological and technical variation but also
require the assaying of chromatin states across multiple individuals.
Given that the links between chromatin and transcription
factor motifs are likely to differ between cell types, using such
population-based approaches to identify pioneer factors for
each cell type can be expensive. Here, we show that putative
pioneer factors can potentially be identified within a single
sample through the study of paralogous regions.

The majority of DNA methylation and chromatin diver- 
gence was observed to occur at distal regulatory regions. 
Although DNA methylation states at promoter regions have 
been highly conserved, these distal regions appear to have 
been the main reservoirs of epigenome divergence in the 
human lineage. Such distal regulatory regions have previously 
been shown to display more cell type-specific patterns of DNA 
methylation (Ziller et al. 2013), and we observed that sites of 
cell type-specific methylation were also more likely to diverge 
following a duplication event. 

Alu elements were observed to be preferentially enriched 
around methylated copies of discordant paralogous CpG sites. 
A third of all CpG sites in the human genome is located within 
an Alu element, and it has been shown that the methylation 
of these CpG sites can be transcriptionally repressive and 
increase their mutation rate (due to the high deamination 
rate of methylated cytosines) ultimately leading to the loss 
of their activity (Cordaux and Batzer 2009). A "seed and 
spread model" has been proposed to explain the observed 
patterns of DNA methylation in the genome where methyla- 
tion at one region can spread to neighboring sites (Turker 
2002; Zhang et al. 2012). It has also been shown that repet- 
itive elements can potentially act as seeds, with the insertion of 
repetitive elements adjacent to the INSL6 promoter leading 
to the de novo methylation of specific CpG sites at the 
region (Zhang et al. 2012). The results presented here support
the hypothesis that close proximity to repetitive elements can 
alter the methylation state of nearby sites and that Alu 
elements have substantially shaped the evolution of the 
human epigenome. Transcription factor binding appears 
broadly to abrogate DNA methylation at a local region, 
whereas the presence of Alu elements is associated with 
increased methylation levels. We highlight that both factors 
can occur together to reshape the epigenome at regions of 
otherwise generally strong methylation conservation. These 
results support the model that DNA methylation can spread 
from seed regions, such as Alu elements, but be blocked by 
barriers such as transcription factor binding (Turker 2002; 
Zhang et al. 2012). We have shown that recent Alu element 
insertions in the human genome are linked to the remodeling 
of local methylation patterns in human brain cells, with sites of 
Alu-associated methylation remodeling preferentially linked to 
interspecies differences in gene expression and regions associated
with key neurological pathways. Importantly, this methylation
divergence occurs not within the Alu element itself but at
conserved sites, often hundreds of basepairs from the insertion
site. These results suggest Alu elements are not always neutral or
pathogenic additions to the human genome but may have
driven key changes in the human epigenome, leading to im- 
portant phenotypic differences between humans and other 
primates. 

A substantial literature attests to the importance of gene 
duplication and the divergence of sister copies during the 
evolution of protein-coding genes (Zhang 2003). Here, we 
provide evidence of analogous processes acting at the level 
of DNA methylation and chromatin structure to affect regu- 
latory evolution across the human genome. We show that 
regulatory modules (particularly at promoters) can be 
copied, inserted into new chromosomal environments, and 
usually maintain their original chromatin states. On the 
other hand, particular duplicated distal regulatory elements 
have diverged to adopt different chromatin states and pre- 
sumably different functions. The mechanisms underlying this 
chromatin divergence appear to be linked to surprisingly spe- 
cific sequence-level changes, underlining the interplay of 
genome and epigenome in recent human evolution. 

Materials and Methods 

Data Sets 

The locations and alignments of human segmental duplication 
events greater than 1 kb in length and over 90% identical 
between regions were obtained from
http://humanparalogy.gs.washington.edu/build36/align_both/ (last accessed July 1,
2014) (She et al. 2004b). The distribution of sizes of these 
duplicated regions is shown in supplementary figure S9, 
Supplementary Material online, and the genomic preferences 
of segmental duplications have previously been documented, 
with many showing an association with regions of known chromosomal
instability and rearrangement such as those at subtelomeric
and pericentromeric regions (Bailey et al. 2001; She
et al. 2004b). They are also enriched within relatively gene rich 
chromosomes (Bailey et al. 2002). In total, 159 Mb of the 
genome is involved in at least one of these duplication 
events. Assuming neutrality and a molecular clock, this is the 
fraction of the human genome that has undergone duplication 
within the past 35 Myr of primate evolution (Bailey and Eichler 
2006). The H1, H1-derived neural progenitor, and IMR90
whole-genome bisulfite sequencing data sets were obtained
from the National Institutes of Health (NIH) Roadmap
Epigenomics project (http://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/?view=matrix,
last accessed July 1, 2014) (Lister et al. 2009). Histone modification, transcription
factor ChIP-seq, and DNase-seq data were obtained from a
combination of both the Encyclopedia of DNA Elements 
(ENCODE Project Consortium et al. 2012) and NIH 
Epigenomics Roadmap (Bernstein et al. 2010) projects. A 
full list of the histone modification data sets used in this
study can be found in the following file (http://datashare.is.ed.ac.uk/bitstream/handle/10283/239/1756-8935-5-6-s6.xlsx,
last accessed July 1, 2014).

Genome Biol. Evol. 6(7):1758-1771. doi:10.1093/gbe/evu142 Advance Access publication June 24, 2014

Read Mapping 

To enable the accurate study of chromatin at segmentally
duplicated regions, only reads that could be unambiguously
assigned to a single region in the reference genome were
included in all analyses in this study. ChIP-seq reads were
first trimmed to the first base whose quality was 20 or
below using FastX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/index.html,
last accessed July 1, 2014) to remove
low-quality read sections. Reads were then mapped to the
hg18 reference genome using bowtie (Langmead et al.
2009) with the -e 1 and -m 1 parameters, ensuring only
reads that uniquely mapped to one region with no mismatches
were retained. Whole-genome bisulfite sequencing
reads were mapped to the reference genome using Bismark
(Krueger and Andrews 2011). Bismark fully bisulfite converts
sequence reads and maps each to bisulfite-converted versions
of the reference genome, with only reads producing a unique
best alignment to a region being kept. Reads that contained
any mismatches to the genome that could not be attributed to
bisulfite conversion (C->T or G->A) on the appropriate strand
were discarded. H1 RNA-seq data from the ENCODE and NIH
epigenome roadmap projects were obtained and analyzed as
previously described (Prendergast et al. 2012).
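
The two read-level filters described above, quality trimming and the rejection of mismatches not explained by bisulfite conversion, can be illustrated with a short sketch. It is not the FastX-Toolkit or Bismark implementation; the function names are hypothetical, reads are assumed to be already aligned base for base against the forward-strand reference, and the low-quality base itself is assumed to be discarded along with everything after it.

```python
from typing import Sequence


def trim_at_low_quality(seq: str, quals: Sequence[int], threshold: int = 20) -> str:
    """Keep the read prefix preceding the first base with quality <= threshold."""
    for i, q in enumerate(quals):
        if q <= threshold:
            # Assumption: the offending base is dropped with the rest of the read.
            return seq[:i]
    return seq


def is_bisulfite_compatible(ref: str, read: str, strand: str) -> bool:
    """True if every mismatch could be attributed to bisulfite conversion.

    In forward-strand coordinates, '+' strand reads may show C->T and
    '-' strand reads may show G->A; any other mismatch fails the read.
    """
    allowed = ("C", "T") if strand == "+" else ("G", "A")
    return all(r == q or (r, q) == allowed for r, q in zip(ref, read))


print(trim_at_low_quality("ACGTAC", [30, 30, 20, 30, 30, 30]))
print(is_bisulfite_compatible("ACGT", "ATGT", "+"))
```

Discarding reads with unexplained mismatches trades coverage for specificity, which matters here because paralogous copies can differ at only a handful of bases.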

DNA Methylation Analysis 

The location of duplicated CpG sites on both strands of the 
reference sequence was first determined, and these sites were 
then stringently filtered according to the following criteria. 
Sites where a known polymorphism (dbSNP 135) overlapped 
the cytosine in a CpG site, or was located at either flanking 
base, were excluded. Likewise, any CpG sites overlapped by 
reads carrying alternative alleles (excluding those expected 
from bisulfite conversion) were excluded, and all CpG sites 
had to be covered by at least one read supporting the pres- 
ence of a cytosine at the corresponding position. The number 
of bisulfite converted and unconverted reads overlapping 
each CpG site was counted, and sites where either CpG site 
was covered by fewer than six reads were excluded (H1 median
depth at paralogous CpG sites: 5, mean depth: 8.13, standard
deviation [SD]: 15.27; H1-derived neural progenitor median: 7, mean: 8.91, SD:
11.69; and IMR90 median: 6, mean: 8.75, SD: 16.31). Sites
where total read coverage across the two sites exceeded 100
were also excluded (to exclude sites displaying significant but
only marginal differences between paralogous regions).
Having applied these filters, 82,692, 127,187, and 85,966
paralogous pairs of CpG sites remained in the H1, H1-derived
neural progenitor, and IMR90 data sets, respectively. To assess
whether the concordance between methylation levels of paralogous
CpG sites was greater than would be expected by
chance, CpG sites were randomly shuffled between all
duplicated regions 100 times. In all permutations, the proportion
of sites displaying a <20% difference in methylation
levels was lower than that observed in the unpermuted
data. Only paralogous pairs of CpG sites with a P value smaller
than 5 × 10⁻⁷ (Fisher's exact test) were deemed to be significantly
differentially methylated (corresponding approximately
to a Bonferroni-corrected P value of 0.05 in each analysis). The
ancestral and derived copies of interchromosomal paralogous
CpG sites were determined by lifting both sites over to the
PanTro4 chimpanzee genome. If both sites lifted over to the
same chromosome, the site on the syntenic human chromosome
was determined to be the ancestral copy. The ancestral
copy of 383 (11.9%), 987 (12.8%), and 2,065 (9.7%) pairs of
interchromosomal paralogous CpG sites with at least one
copy with a methylation level less than 50% in the H1, H1-derived
neural progenitor, and IMR90 cell lines, respectively,
were successfully determined in this way.
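
The coverage filters and the Fisher's exact test applied to each paralogous pair can be sketched as below. This is an illustrative reimplementation using only the standard library, not the code used in the study. The function names are hypothetical, the two-sided P value is assumed to sum all tables no more probable than the observed one, and the significance threshold is assumed to be a Bonferroni correction of 0.05 over the number of pairs tested.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def is_discordant(meth1, unmeth1, meth2, unmeth2, n_tests,
                  min_depth=6, max_total=100, alpha=0.05):
    """Return None if the pair fails the coverage filters, else True/False."""
    depth1, depth2 = meth1 + unmeth1, meth2 + unmeth2
    if depth1 < min_depth or depth2 < min_depth or depth1 + depth2 > max_total:
        return None
    p = fisher_exact_two_sided(meth1, unmeth1, meth2, unmeth2)
    # Assumption: Bonferroni threshold of alpha divided by the number of pairs.
    return p < alpha / n_tests


# Hypothetical read counts: one copy fully methylated, the other unmethylated.
print(is_discordant(30, 0, 0, 30, n_tests=100_000))
```

Capping total coverage at 100 reads keeps the exact test cheap and, as the text notes, avoids calling tiny methylation differences significant purely through depth.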

Linear Modeling 

The relationship between methylation changes between cell
types and the divergence in methylation levels observed between
paralogous sites was modeled using multiple linear regression.
The change in methylation levels observed between
the H1 and H1-derived neural progenitor cell lines and the
methylation level of the same site in the H1 cell line were
fitted as explanatory variables along with an interaction
term, that is, the equation was of the form:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i,$$

where $X_{i1}$ corresponds to the observed methylation level of
the $i$-th CpG site in the H1 cell line, $X_{i2}$ is the observed
methylation change between the H1 and H1-derived neural progenitor
cell lines of the same $i$-th site, and $Y_i$ corresponds to
the $i$-th site's observed difference in methylation level to its
paralogous site. Both variables and the interaction term
were highly significantly linked to the corresponding site's
difference in methylation to its paralogous site within the H1 cell
line ($P < 2.2 \times 10^{-16}$).
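
A minimal sketch of fitting a two-variable linear model with an interaction term by ordinary least squares is given below. It is illustrative only: the study does not say which software was used, the function name is hypothetical, and the sketch returns point estimates without the significance tests reported in the text.

```python
def fit_interaction_model(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + b3*x1*x2.

    Returns [b0, b1, b2, b3]. Assumes the design matrix has full rank.
    """
    rows = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
    k = 4
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical noise-free data generated from known coefficients.
x1 = [0.0, 0.2, 0.5, 0.8, 1.0, 0.3, 0.9, 0.1]    # methylation level in one cell line
x2 = [-0.5, 0.1, 0.4, -0.2, 0.0, 0.6, -0.7, 0.3]  # change between cell types
y = [0.1 + 0.2 * a - 0.3 * b + 0.5 * a * b for a, b in zip(x1, x2)]
print(fit_interaction_model(x1, x2, y))
```

The interaction coefficient lets the effect of a between-cell-type change depend on the starting methylation level, which is the reason for including the product term.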

ChIP-Seq Analysis

Conservation of chromatin patterns between paralogous
regions was determined from ChIP-seq data by first excluding
pairs of aligned bases to which reads could not be
uniquely mapped in both regions. Mapped reads shorter
than 35 bp were discarded, and regions of 35 bp that
were not unique in the genome were identified using
the wgEncodeDukeUniqueness35bp table from the UCSC
genome browser (Kent et al. 2002). Corresponding sites
that were not unique at either duplicated copy of a region
were ignored, and the number of reads mapping to the
remaining positions at each nonoverlapping 500 bp region
was counted. Regions of discordant chromatin state were identified
using a binomial test. Only pairs of paralogous sites with a
Bonferroni-corrected P value less than 0.05 and where one
copy of the region had a read count of zero (to restrict to sites
where binding has been completely lost on one copy) were
used in the analysis of the divergence of underlying DNA
motifs.
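
The selection of regions with complete loss of signal on one copy can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: it assumes the binomial test compares the two read counts against an expected 50:50 split, that the Bonferroni correction is over all pairs supplied, and the function names and input layout are hypothetical.

```python
from math import comb


def binomial_two_sided(count_a: int, count_b: int) -> float:
    """Exact two-sided binomial test of a 50:50 split of reads."""
    n = count_a + count_b
    k = min(count_a, count_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def lost_binding_pairs(pairs, alpha=0.05):
    """pairs: list of (region_id, reads_copy1, reads_copy2) per 500 bp window.

    Keeps regions where exactly one copy has zero reads and the
    Bonferroni-corrected P value is below alpha.
    """
    n_tests = len(pairs)  # assumption: correct over every pair supplied
    kept = []
    for region_id, a, b in pairs:
        if min(a, b) != 0 or a + b == 0:
            continue  # require signal on one copy and none on the other
        if binomial_two_sided(a, b) * n_tests < alpha:
            kept.append(region_id)
    return kept


# Hypothetical read counts for four paralogous windows.
example = [("r1", 40, 0), ("r2", 3, 0), ("r3", 20, 18), ("r4", 0, 0)]
print(lost_binding_pairs(example))
```

Requiring a zero count on one copy is stricter than significance alone, so a window with a large but incomplete drop in signal would be excluded from the motif analysis.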

Motif Analysis 

DNA motifs enriched 500 bp either side of discordant CpG 
sites were identified using MEME (Bailey and Elkan 1994) 
and HOMER (Heinz et al. 2010). To ensure each region only 
appeared once in this analysis, where more than one of the 
discordant CpG sites was within 1 kb only one (arbitrarily 
chosen) paralogous pair of sites was kept. Following this filtering,
32 pairs of sites remained in the H1 analysis, 46 in the
H1-derived neural progenitor data set, and 22 in the study of
the human prefrontal cortex. To identify motifs linked to
methylation divergence at sites outside CpG islands across
tissues, 13 pairs of sites were randomly selected from each
data set (13 being the number of pairs of CpG sites not linked
to a CpG island in the prefrontal cortex data set, the smallest 
number of pairs across these data sets). Discriminative motif 
discovery was performed by providing the regions around the 
methylated and unmethylated sites separately, and reversing 
the background and foreground sets to discover motifs en- 
riched around both groups of sites. Locations of repeats in the 
human genome were obtained from the UCSC genome 
browser (Kent et al. 2002). 
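
The step that keeps only one discordant site per 1 kb neighbourhood can be sketched as below. This is an assumption-laden illustration: the text says the retained pair was chosen arbitrarily, whereas this sketch keeps the leftmost site of each cluster, and it represents each paralogous pair by the coordinate of a single copy. The function name is hypothetical.

```python
def thin_sites(sites, min_gap=1000):
    """Keep one site per cluster of sites lying within min_gap bp.

    sites: iterable of (chromosome, position) tuples.
    Returns the retained sites sorted by chromosome and position.
    """
    kept = []
    last_kept = {}  # chromosome -> position of the most recently kept site
    for chrom, pos in sorted(sites):
        # Assumption: the leftmost site of each cluster is the one retained.
        if chrom not in last_kept or pos - last_kept[chrom] > min_gap:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept


# Hypothetical coordinates of discordant CpG sites.
example_sites = [("chr1", 100), ("chr1", 600), ("chr1", 1200),
                 ("chr2", 150), ("chr1", 5000)]
print(thin_sites(example_sites))
```

Thinning in this way prevents a single region with several discordant CpG sites from contributing the same flanking sequence to motif discovery more than once.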

The HOMER program (Heinz et al. 2010) was used to de- 
termine motifs that had diverged between paralogous regions 
discordant in their chromatin states. Only histone modifica- 
tions with at least 70 discordant regions could be successfully 
analyzed with HOMER. Known sites with a corresponding 
Benjamini q value less than 0.01 were treated as enriched 
between discordant regions. 

To detect an enrichment of repeat elements around meth- 
ylated copies of discordant CpG pairs, the locations of repeats 
in the human genome were obtained from the RepeatMasker 
track at the UCSC genome browser (Kent et al. 2002). 

Primate Brain Methylation Divergence 

Human-specific Alu insertions were characterized by identify- 
ing Alu sequences present in the human genome but absent 
from the orthologous region of the chimpanzee and orangu- 
tan genomes in the UCSC genome browser chained align- 
ments (Kent et al. 2002). The same approach was used to 
identify chimpanzee-specific insertions. In total 4,435 
human-specific and 1,882 chimpanzee-specific Alu insertion 
events were identified. BS-seq data corresponding to three 
human and three chimpanzee prefrontal cortex samples 
were obtained from Zeng et al. (2012). Reads were mapped 
to the hg19 and PanTro3 genomes using Bismark (Krueger 
and Andrews 2011) with duplicate reads subsequently 
removed. Data were combined across the three replicates
for each species and the level of methylation for each CpG
site conserved across both species was determined. Sites with 
a combined depth of less than five reads in either species were 
excluded. Gene level, processed RNA-seq expression data for 
human, chimpanzee, and macaque (Zeng et al. 2012) were 
obtained from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE33587
(last accessed July 1, 2014).

Functional enrichment analysis was performed using 
GREAT (McLean et al. 2010) by comparing all CpG sites of 
low methylation in the chimpanzee genome (<40% methylated)
within 2 kb of a human-specific Alu insertion site, to
the subset of these sites where the human methylation proportion
has increased by an absolute proportion of at least 0.6
relative to the chimpanzee methylation level (i.e., had gone 
from a low- to a high-methylation level). This allowed us to 
identify regions where the methylation level showed evidence 
of having been substantially remodeled following the Alu 
insertion. 
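
The construction of the background and foreground site sets passed to GREAT can be sketched as follows. The thresholds (chimpanzee methylation below 0.4, within 2 kb of an insertion, absolute gain of at least 0.6 in human) come from the text; the function names, the single-chromosome input layout, and the use of a sorted list with binary search are assumptions made for illustration.

```python
from bisect import bisect_left


def near_insertion(pos, insertions, max_dist=2000):
    """True if pos lies within max_dist bp of any sorted insertion coordinate."""
    i = bisect_left(insertions, pos)
    # Only the insertions immediately left and right of pos can be nearest.
    neighbours = insertions[max(0, i - 1):i + 1]
    return any(abs(pos - ins) <= max_dist for ins in neighbours)


def great_sets(cpgs, insertions, low=0.4, gain=0.6):
    """Split CpG sites on one chromosome into foreground and background.

    cpgs: list of (position, chimp_methylation, human_methylation).
    Background: lowly methylated in chimpanzee and near an insertion.
    Foreground: background sites whose human methylation rose by >= gain.
    """
    insertions = sorted(insertions)
    background = [c for c in cpgs if c[1] < low and near_insertion(c[0], insertions)]
    foreground = [c for c in background if c[2] - c[1] >= gain]
    return foreground, background


# Hypothetical sites around a single insertion at position 2000.
example_cpgs = [(1000, 0.1, 0.9), (1500, 0.3, 0.5), (2500, 0.8, 0.95), (9000, 0.05, 0.9)]
fg, bg = great_sets(example_cpgs, [2000])
print(len(fg), len(bg))
```

Using the lowly methylated sites near insertions as the background, rather than the whole genome, controls for where Alu elements tend to insert in the first place.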

Supplementary Material 

Supplementary figures S1-S11 and tables S1-S3 are available
at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).